Speech-to-text technology for hard-of-hearing people
Hard-of-hearing people face challenges in daily interactions that involve spoken language, such as meetings or doctor’s visits. Automatic speech recognition technology can support them by providing a written transcript of the conversation. Pro Audito Schweiz, the Swiss federation of hard-of-hearing people, and the Centre for Artificial Intelligence (CAI) at the Zurich University of Applied Sciences (ZHAW) conducted a preliminary study into the use of Speech-to-Text (STT) for this target group. Our survey among the members of Pro Audito found strong interest in using automated solutions for better understanding in everyday situations. We now propose to take the next step and develop an application which uses ZHAW’s high-quality STT models
Dialect Transfer for Swiss German Speech Translation
This paper investigates the challenges in building Swiss German speech
translation systems, specifically focusing on the impact of dialect diversity
and differences between Swiss German and Standard German. Swiss German is a
spoken language with no formal writing system; it comprises many diverse
dialects and is a low-resource language with only around 5 million speakers.
The study is guided by two key research questions: how does the inclusion and
exclusion of dialects during the training of speech translation models for
Swiss German impact the performance on specific dialects, and how do the
differences between Swiss German and Standard German impact the performance of
the systems? We show that dialect diversity and linguistic differences pose
significant challenges to Swiss German speech translation, which is in line
with linguistic hypotheses derived from empirical investigations
Overview of the GermEval 2020 shared task on Swiss German language identification
In this paper, we present the findings of the Shared Task on Swiss German Language Identification organised as part of the 7th edition of GermEval, co-located with SwissText and KONVENS 2020
Missing information, unresponsive authors, experimental flaws : the impossibility of assessing the reproducibility of previous human evaluations in NLP
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more or less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction were discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP
ZHAW-InIT : social media geolocation at VarDial 2020
We describe our approaches for the Social Media Geolocation (SMG) task at the VarDial Evaluation Campaign 2020. The goal was to predict geographical location (latitudes and longitudes) given an input text. There were three subtasks corresponding to German-speaking Switzerland (CH), Germany and Austria (DE-AT), and Croatia, Bosnia and Herzegovina, Montenegro and Serbia (BCMS). We submitted solutions to all subtasks but focused our development efforts on the CH subtask, where we achieved third place out of 16 submissions with a median distance of 15.93 km and had the best result of 14 unconstrained systems. In the DE-AT subtask, we ranked sixth out of ten submissions (fourth of 8 unconstrained systems) and for BCMS we achieved fourth place out of 13 submissions (second of 11 unconstrained systems)
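As an illustration of the median-distance metric reported above, the following is a minimal sketch of how it could be computed from predicted and gold coordinates. It assumes great-circle (haversine) distance on a spherical Earth; the abstract does not state which distance formula the task used, and the function names `haversine_km` and `median_distance_km` are hypothetical.

```python
import math
from statistics import median

# Assumption: mean Earth radius in kilometres, spherical Earth model.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_distance_km(predicted, gold) -> float:
    """Median distance between paired (lat, lon) predictions and gold labels."""
    if len(predicted) != len(gold) or not predicted:
        raise ValueError("predicted and gold must be non-empty and equally long")
    return median(haversine_km(*p, *g) for p, g in zip(predicted, gold))
```

The median is less sensitive than the mean to a few predictions that land far from the true location, which may be one reason to report it for this kind of task.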
STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions
We present STT4SG-350 (Speech-to-Text for Swiss German), a corpus of Swiss
German speech, annotated with Standard German text at the sentence level. The
data is collected using a web app in which the speakers are shown Standard
German sentences, which they translate to Swiss German and record. We make the
corpus publicly available. It contains 343 hours of speech from all dialect
regions and is the largest public speech corpus for Swiss German to date.
Application areas include automatic speech recognition (ASR), text-to-speech,
dialect identification, and speaker recognition. Dialect information, age
group, and gender of the 316 speakers are provided. Genders are equally
represented and the corpus includes speakers of all ages. Roughly the same
amount of speech is provided per dialect region, which makes the corpus ideally
suited for experiments with speech technology for different dialects. We
provide training, validation, and test splits of the data. The test set
consists of the same spoken sentences for each dialect region and allows a fair
evaluation of the quality of speech technologies in different dialects. We
train an ASR model on the training set and achieve an average BLEU score of
74.7 on the test set. The model beats the best published BLEU scores on 2 other
Swiss German ASR test sets, demonstrating the quality of the corpus
CEASR : a corpus for evaluating automatic speech recognition
In this paper, we present CEASR, a Corpus for Evaluating ASR quality. It is a data set derived from public speech corpora, containing manual transcripts enriched with metadata along with transcripts generated by several modern state-of-the-art ASR systems. CEASR provides this data in a unified structure, consistent across all corpora and systems with normalised transcript texts and metadata.
We then use CEASR to evaluate the quality of ASR systems on the basis of their Word Error Rate (WER). Our experiments show, among other results, a substantial difference in quality between commercial and open-source ASR tools, and differences of up to a factor of ten for single systems on different corpora. By using CEASR, we could obtain these results efficiently and easily. This shows that our corpus enables researchers to perform ASR-related evaluations and various in-depth analyses with noticeably reduced effort, without the need to collect, process and transcribe the speech data themselves
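For readers unfamiliar with the metric, Word Error Rate is commonly defined as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a system transcript, divided by the number of reference words. The sketch below shows that standard definition only. It is not the CEASR evaluation code; it assumes whitespace tokenisation and no text normalisation, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()   # assumption: simple whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.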
ZHAW-InIT at GermEval 2020 task 4 : low-resource speech-to-text
This paper presents the contribution of ZHAW-InIT to Task 4 “Low-Resource STT” at GermEval 2020. The goal of the task is to develop a system for translating Swiss German dialect speech into Standard German text in the domain of parliamentary debates. Our approach is based on Jasper, a CNN Acoustic Model, which we fine-tune on the task data. We enhance the base system with an extended Language Model containing in-domain data and speed perturbation and run further experiments with post-processing. Our submission achieved first place with a final Word Error Rate of 40.29%